Create a Content Policy

Select a Moderation Behavior

After selecting Content Policy, first choose the moderation behavior your policy should follow. Content policies support two behaviors: Flag and Block.
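The difference between the two behaviors can be pictured as a small decision function. This is a minimal sketch, assuming that Flag lets violating content through while marking it, and Block withholds it. `ModerationBehavior` and `apply_behavior` are hypothetical names for illustration, not part of DynamoGuard's API.

```python
from enum import Enum


class ModerationBehavior(Enum):
    # Hypothetical enum mirroring the two behaviors offered in the UI.
    FLAG = "flag"
    BLOCK = "block"


def apply_behavior(text: str, violates: bool, behavior: ModerationBehavior) -> dict:
    """Return a hypothetical moderation result for one piece of content.

    Assumption: FLAG passes violating content through but marks it,
    while BLOCK withholds violating content entirely.
    """
    if not violates:
        return {"text": text, "flagged": False, "blocked": False}
    if behavior is ModerationBehavior.FLAG:
        return {"text": text, "flagged": True, "blocked": False}
    # BLOCK: the content is withheld.
    return {"text": None, "flagged": True, "blocked": True}


print(apply_behavior("some text", True, ModerationBehavior.FLAG))
# {'text': 'some text', 'flagged': True, 'blocked': False}
print(apply_behavior("some text", True, ModerationBehavior.BLOCK))
# {'text': None, 'flagged': True, 'blocked': True}
```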

Select Policy Usage

Select an input or output moderation policy, depending on whether the content policy will guardrail user inputs or model responses.
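Where each kind of policy sits in a chat turn can be sketched as follows. This assumes a guardrail is simply a function that returns `True` when text violates the policy, and for simplicity it treats every violation as a block; `guarded_chat`, `echo_model`, and `mentions_password` are hypothetical stand-ins, not DynamoGuard functions.

```python
from typing import Callable, Optional

# Hypothetical signature: a guardrail returns True when the text violates the policy.
Guardrail = Callable[[str], bool]


def guarded_chat(
    prompt: str,
    model: Callable[[str], str],
    input_policy: Optional[Guardrail] = None,
    output_policy: Optional[Guardrail] = None,
) -> str:
    """Run one chat turn with optional input and output guardrails."""
    # Input policy: screens the user's prompt before it reaches the model.
    if input_policy is not None and input_policy(prompt):
        return "[prompt blocked by input policy]"
    response = model(prompt)
    # Output policy: screens the model's response before it reaches the user.
    if output_policy is not None and output_policy(response):
        return "[response blocked by output policy]"
    return response


# Toy stand-ins for a real model and a trained guardrail.
def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"


def mentions_password(text: str) -> bool:
    return "password" in text.lower()


print(guarded_chat("What is my password?", echo_model, input_policy=mentions_password))
# [prompt blocked by input policy]
print(guarded_chat("Tell me a password", echo_model, output_policy=mentions_password))
# [response blocked by output policy]
```

The same guardrail function can serve either role; what changes is whether it inspects the user's prompt or the model's response.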

Content Policy Setup

Provide Policy Seed Information

Provide a policy name, a description, and lists of allowed and disallowed behaviors. For content policies, DynamoGuard uses this information to generate a set of edge-case training data with APO.

For best results when creating a content policy, follow the guidelines below:

  • Be Specific: Specific guard policies leave less room for misinterpretation, which improves protection against misuse.
  • Provide Detailed Behaviors: List 5-6 allowed and 5-6 disallowed behaviors. Detailed behaviors help users and moderators understand exactly what counts as compliant and non-compliant, keeping the policy aligned with its goals.
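The seed information can be thought of as a small record, and the guidelines as a pre-submission check. This is a hypothetical sketch: `PolicySeed` and `guideline_warnings` are illustrative names rather than a DynamoGuard API, and the example policy is invented. The check only warns when a list has fewer than five entries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PolicySeed:
    """Hypothetical container for the seed information entered in the UI."""

    name: str
    description: str
    allowed_behaviors: List[str] = field(default_factory=list)
    disallowed_behaviors: List[str] = field(default_factory=list)

    def guideline_warnings(self) -> List[str]:
        """Return advisory warnings; an empty list means the seed looks complete."""
        warnings: List[str] = []
        if not self.name.strip():
            warnings.append("policy name is empty")
        if not self.description.strip():
            warnings.append("description is empty")
        # The guidelines recommend 5-6 entries in each list.
        for label, items in (
            ("allowed", self.allowed_behaviors),
            ("disallowed", self.disallowed_behaviors),
        ):
            if len(items) < 5:
                warnings.append(f"{label} behaviors: {len(items)} listed, 5-6 recommended")
        return warnings


seed = PolicySeed(
    name="No Financial Advice",
    description="The assistant must not give personalized investment advice.",
    allowed_behaviors=[
        "Explaining what an index fund is",
        "Defining common financial terms",
    ],
    disallowed_behaviors=[
        "Recommending specific stocks to buy",
    ],
)
print(seed.guideline_warnings())
# ['allowed behaviors: 2 listed, 5-6 recommended', 'disallowed behaviors: 1 listed, 5-6 recommended']
```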

Prompt Review

After you specify the seed information above, DynamoGuard provides an initial set of 10 prompts (or responses) for review, along with its reasoning about each one's compliance status. Input guardrails show prompts only, while output guardrails show both prompts and responses. For each datapoint, the policy creator decides whether the prompt or response is compliant: DynamoGuard suggests a status and an explanation, but the policy creator sets the ground-truth value. This review continues until at least 5 compliant and 5 non-compliant prompts (or model responses) have been marked as relevant.
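The stopping rule for this review step can be sketched as a loop. This assumes that each further round produces another batch of 10 datapoints; `ReviewItem`, `review_complete`, `run_review`, and the `max_rounds` safety cap are hypothetical, not part of DynamoGuard.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ReviewItem:
    """One datapoint shown during prompt review (hypothetical structure)."""

    text: str                         # a prompt, or a prompt plus response for output policies
    suggested_compliant: bool         # the status DynamoGuard suggests
    explanation: str                  # DynamoGuard's reasoning for the suggestion
    compliant: Optional[bool] = None  # ground truth chosen by the policy creator
    relevant: bool = False            # whether the creator marked the datapoint relevant


def review_complete(items: List[ReviewItem], minimum: int = 5) -> bool:
    """True once at least `minimum` compliant and `minimum` non-compliant
    datapoints have been labeled and marked relevant."""
    labeled = [i for i in items if i.relevant and i.compliant is not None]
    n_compliant = sum(1 for i in labeled if i.compliant)
    n_noncompliant = len(labeled) - n_compliant
    return n_compliant >= minimum and n_noncompliant >= minimum


def run_review(
    generate_batch: Callable[[int], List[ReviewItem]],
    label: Callable[[ReviewItem], None],
    batch_size: int = 10,
    minimum: int = 5,
    max_rounds: int = 20,
) -> List[ReviewItem]:
    """Request batches of `batch_size` items until the stopping rule is met."""
    reviewed: List[ReviewItem] = []
    for _ in range(max_rounds):
        if review_complete(reviewed, minimum):
            break
        for item in generate_batch(batch_size):
            label(item)  # the creator accepts or overrides the suggested status
            reviewed.append(item)
    return reviewed


# Demo: a fake generator alternating suggestions, and a reviewer who accepts them.
def demo_batch(n: int) -> List[ReviewItem]:
    return [ReviewItem(f"prompt {k}", k % 2 == 0, "demo") for k in range(n)]


def accept_suggestion(item: ReviewItem) -> None:
    item.compliant = item.suggested_compliant
    item.relevant = True


reviewed = run_review(demo_batch, accept_suggestion)
print(len(reviewed), review_complete(reviewed))  # 10 True
```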

Once the initial datapoints have been reviewed, DynamoGuard generates a larger training dataset based on your feedback. At this stage you can still remove datapoints from the set or change their labels.
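Editing the generated dataset amounts to two operations, dropping rows and overriding labels. The sketch below shows them on an in-memory list; `Datapoint` and `curate` are hypothetical names, and the sample rows are invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Set


@dataclass(frozen=True)
class Datapoint:
    """One row of the generated training dataset (hypothetical structure)."""

    id: int
    text: str
    compliant: bool


def curate(
    dataset: List[Datapoint],
    remove_ids: Set[int],
    relabel: Dict[int, bool],
) -> List[Datapoint]:
    """Return a new dataset with rows removed and labels overridden.

    The input list is left unchanged.
    """
    curated: List[Datapoint] = []
    for dp in dataset:
        if dp.id in remove_ids:
            continue  # drop datapoints that do not belong in the policy's data
        if dp.id in relabel:
            dp = replace(dp, compliant=relabel[dp.id])  # apply the corrected label
        curated.append(dp)
    return curated


generated = [
    Datapoint(1, "How do index funds work?", True),
    Datapoint(2, "Which stock will double next week?", True),
    Datapoint(3, "asdf", False),
]
final = curate(generated, remove_ids={3}, relabel={2: False})
print([(dp.id, dp.compliant) for dp in final])  # [(1, True), (2, False)]
```

Returning a new list rather than editing in place keeps the originally generated set available for comparison.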

Policy Training

After the dataset has been finalized, the policy can be sent for training: DynamoGuard uses the dataset to fine-tune its lightweight guardrail model. To train the policy, click the train policy button in the training tab. Once the policy has been trained, it can be deployed.
